Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation

Authors

  • Sylvain Arlot
  • Matthieu Lerasle
Abstract

This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization). In particular, this result implies that V-fold penalization is asymptotically optimal in the nonparametric case. Then, we compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on V like 1 + 4/(V − 1), at least in some particular cases, suggesting that performance improves considerably from V = 2 to V = 5 or 10, and is almost constant thereafter. Overall, this can explain the common advice to take V = 5, at least in our setting and when computational power is limited, as supported by some simulation experiments. An oracle inequality and exact formulas for the variance are also proved for Monte-Carlo cross-validation, also known as repeated cross-validation, where the parameter V is replaced by the number B of random splits of the data.
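As an illustration of the criterion studied in the paper (not the authors' code), here is a minimal sketch of V-fold cross-validation for least-squares density estimation, using histogram estimators on [0, 1]. For a held-out fold, the unbiased estimate of the least-squares risk of the estimator trained on the remaining folds is the usual quantity ∫ f̂² − (2/n′) Σ f̂(Xᵢ). The Beta-distributed sample, the model grid, and all function names are assumptions made for the example.

```python
import numpy as np

def histogram_density(train, m):
    """Piecewise-constant density estimator on [0, 1] with m equal bins."""
    counts, edges = np.histogram(train, bins=m, range=(0.0, 1.0))
    bin_width = 1.0 / m
    heights = counts / (len(train) * bin_width)  # so the histogram integrates to 1
    return heights

def ls_vfold_score(x, m, V, rng):
    """V-fold CV estimate of the least-squares risk E[∫ fhat^2 - 2 fhat(X)]."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, V)
    bin_width = 1.0 / m
    scores = []
    for fold in folds:
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[fold] = False
        heights = histogram_density(x[train_mask], m)
        quad = np.sum(heights**2) * bin_width          # integral of fhat^2
        test = x[fold]
        bins = np.clip((test * m).astype(int), 0, m - 1)
        lin = 2.0 * np.mean(heights[bins])             # (2/n') sum of fhat(X_i)
        scores.append(quad - lin)
    return float(np.mean(scores))

rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=500)   # sample supported on [0, 1]
V = 5                              # the paper's suggested default
models = [2, 4, 8, 16, 32, 64]     # candidate bin counts
best_m = min(models, key=lambda m: ls_vfold_score(x, m, V, rng))
```

The selected `best_m` minimizes the V-fold criterion over the model grid; increasing V mainly reduces the bias of this criterion, at the cost of V trainings per model.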


Similar articles

V-fold Cross-Validation and V-fold Penalization in Least-Squares Density Estimation

This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares risk of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization), with an upper bound decreasing as...


Appendix to the Article “Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation”

This appendix is organized as follows. The first section (called Section B, for consistency with the numbering of the article) gives complementary computations of variances. Then, results concerning hold-out penalization are detailed in Section D, with the proof of the oracle inequality stated in Section 8.2 (Theorem 12) and an exact computation of the variance. Section E provides complements o...


Robust Cross-Validation Score Functions with Application to Weighted Least Squares Support Vector Machine Function Estimation

In this paper, new robust methods for tuning regularization parameters or other tuning parameters of a learning process for non-linear function estimation are proposed: repeated robust cross-validation score functions (repeated-CV Robust V-fold) and a robust generalized cross-validation score function (GCVRobust). Both methods are effective for dealing with outliers and non-Gaussian noise distr...


Semiparametric multivariate density estimation for positive data using copulas

In this paper we estimate density functions for positive multivariate data. We propose a semiparametric approach. The estimator combines gamma kernels or local linear kernels, also called boundary kernels, for the estimation of the marginal densities with parametric copulas to model the dependence. This semiparametric approach is robust both to the well-known boundary bias problem and the curse...


A Comparison of Cross-Validation Techniques in Density Estimation

In the setting of nonparametric multivariate density estimation, theorems are established which allow a comparison of the Kullback-Leibler and the least-squares cross-validation methods of smoothing parameter selection. The family of delta sequence estimators (including kernel, orthogonal series, histogram and histospline estimators) is considered. These theorems also show that eithe...



Journal:
  • Journal of Machine Learning Research

Volume 17, Issue 

Pages  -

Publication date: 2016